FRACTALS WITH POINT IMPACT IN FUNCTIONAL LINEAR REGRESSION

By Ian W. McKeague 1 and Bodhisattva Sen 2 

Columbia University 

This paper develops a point impact linear regression model in 
which the trajectory of a continuous stochastic process, when evalu- 
ated at a sensitive time point, is associated with a scalar response. 
The proposed model complements and is more interpretable than the 
functional linear regression approach that has become popular in re- 
cent years. The trajectories are assumed to have fractal (self-similar) 
properties in common with a fractional Brownian motion with an un- 
known Hurst exponent. Bootstrap confidence intervals based on the 
least-squares estimator of the sensitive time point are developed. Mis- 
specification of the point impact model by a functional linear model 
is also investigated. Non-Gaussian limit distributions and rates of 
convergence determined by the Hurst exponent play an important 
role. 

1. Introduction. This paper investigates a linear regression model involving a scalar response $Y$ and a predictor given by the value of the trajectory of a continuous stochastic process $X = \{X(t), t \in [0,1]\}$ at some unknown time point. Specifically, we consider the point impact linear regression model

(1)    $Y = \alpha + \beta X(\theta) + \varepsilon$

and focus on the time point $\theta \in (0,1)$ as the target parameter of interest. The intercept $\alpha$ and the slope $\beta$ are scalars, and the error $\varepsilon$ is taken to be independent of $X$, having zero mean and finite variance $\sigma^2$. The complete trajectory of $X$ is assumed to be observed (at least on a fine enough grid that it makes no difference in terms of accuracy), even though the model itself only involves the value of $X$ at $\theta$, which represents a "sensitive" time point in terms of the relationship to the response. The main aim of the paper is to show that the precision of estimation of $\theta$ is driven by fractal behavior in $X$, and to develop valid inferential procedures that adapt to a broad range of such behavior. Our model could easily be extended in various ways, for example, to allow multiple sensitive time points or further covariates, but, for simplicity, we restrict attention to (1).

Fig. 1. Log gene expression at 518 loci along chromosome 17 in tissue from a breast cancer patient.

Our motivation for developing this type of model arises from genome-wide
expression studies that measure the activity of numerous genes simultaneously.
In these studies, it is of interest to locate genes showing activity that
is associated with clinical outcomes. Emilsson et al. [10], for example, stud- 
ied gene expression levels at over 24,000 loci in samples of adipose tissue 
to identify genes correlated with body mass index and other obesity-related 
outcomes. Gruvberger-Saal et al. [13] used gene expression profiles from 
the tumors of breast cancer patients to predict estrogen receptor protein 
concentration, an important prognostic marker for breast tumors; see also 
[5]. In such studies, the gene expression profile across a chromosome can
be regarded as a functional predictor, and a gene associated with the clinical
outcome is identified by its base pair position $\theta$ along the chromosome; see
Figure 1. Our aim here is to develop a method of constructing a confidence
interval for $\theta$, leading to the identification of chromosomal regions that are
potentially useful for diagnosis and therapy. Although there is extensive
statistical literature on gene expression data, it is almost exclusively concerned
with multiple testing procedures for detecting differentially expressed genes; 
see, for example, [8, 30]. 

Gene expression profiles (as in Figure 1) clearly display fractal behavior,
that is, self-similarity over a range of scales. Indeed, fractals often arise when
spatiotemporal patterns at higher levels emerge from localized interactions
and selection processes acting at lower levels, as with gene expression ac- 
tivity. Moreover, the recent discovery [19] that chromosomes are folded as 
"fractal globules," which can easily unfold during gene activation, also helps 
explain the fractal appearance of gene expression profiles. 

A basic stochastic model for fractal phenomena is provided by fractional 
Brownian motion (fBm) (see [22]), in which the so-called Hurst exponent 
$H \in (0, 1]$ calibrates the scaling of the self-similarity and provides a natural
measure of trajectory roughness. It featured prominently in the pioneering 
work of Benoit Mandelbrot, who stated ([23], page 256) that fBm provides 
"the most manageable mathematical environment I can think of (for repre- 
senting fractals)." For background on fBm from a statistical modeling point 
of view, see [11]. 

The key issue to be considered in this paper is how to construct a confidence interval for the true sensitive time point $\theta_0$ based on its least squares estimator $\hat\theta_n$, obtained by fitting model (1) from a sample of size $n$,

(2)    $(\hat\alpha_n, \hat\beta_n, \hat\theta_n) = \operatorname{argmin}_{\alpha,\beta,\theta} \sum_{i=1}^{n} [Y_i - \alpha - \beta X_i(\theta)]^2$.

We show that, when $X$ is fBm, both the rate of convergence $r_n$ and limiting distribution of $\hat\theta_n$ depend on $H$. In addition, we construct bootstrap confidence intervals for $\theta_0$ that do not require knowledge of $H$. This facilitates applications (e.g., to gene expression data) in which the type of fractal behavior is not known in advance; the trajectory in Figure 1 has an estimated Hurst exponent of about 0.1, but it would be very difficult to estimate precisely using data in a small neighborhood of $\theta_0$, so a bootstrap approach becomes crucial. We emphasize that nothing about the distribution of $X$ is used in the construction of the estimators or the bootstrap confidence intervals; the fBm assumption will only be utilized to study the large sample properties of these procedures. Moreover, our main results will make essential use of the fBm assumption only locally, that is, in a small neighborhood of $\theta_0$.

The point impact model (1) can be regarded as a simple working model that provides interpretable information about the influence of $X$ at a specific location (e.g., a genetic locus). Such information cannot be extracted using the standard functional linear regression model [27] given by

(3)    $Y = \alpha + \int_0^1 f(t) X(t)\,dt + \varepsilon$,

where $f$ is a continuous function and $\alpha$ is an intercept, because the influence of $X(t)$ is spread continuously across $[0, 1]$ and point-impact effects are excluded. In the gene expression context, if only a few genes are predictive of $Y$, then a model of the form (1) would be more suitable than (3), which does not allow $f$ to have infinite spikes. In general, however, a continuum of locations is likely to be involved (as well as point-impacts), so it is of interest to study the behavior of $\hat\theta_n$ in misspecified settings in which the data arise from combinations of (1) and (3).

Asymptotic results for the least squares estimator (2) in the correctly 
specified setting are presented in Section 2. In Section 3 it is shown that 
the residual bootstrap is consistent for the distribution of $\hat\theta_n$, leading to
the construction of valid bootstrap confidence intervals without knowing H. 
The nonparametric bootstrap is shown to be inconsistent in the same set- 
ting. The effect of misspecification is discussed in Section 4. A two-sample 
problem version of the point impact model is discussed in Section 5. Some 
numerical examples are presented in Section 6, where we compare the pro- 
posed bootstrap confidence interval with Wald-type confidence intervals (in 
which H is assumed to be known); an application to gene expression data is 
also discussed. Concluding remarks appear in Section 7. Proofs are placed 
in Section 8. 

2. Least squares estimation of the sensitive time point. Throughout we take $X$ to be a fBm with Hurst exponent $H$, which, as discussed earlier, controls the roughness of the trajectories. We shall see in this section that the rate of convergence of $\hat\theta_n$ can be expressed explicitly in terms of $H$.

First we recall some basic properties of fBm. A (standard) fBm with Hurst exponent $H \in (0, 1]$ is a Gaussian process $B_H = \{B_H(t), t \in \mathbb{R}\}$ having continuous sample paths, mean zero and covariance function

(4)    $\operatorname{Cov}\{B_H(t), B_H(s)\} = \frac{1}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right)$.

By comparing their mean and covariance functions, $B_H(at) \stackrel{d}{=} a^H B_H(t)$ as processes, for all $a > 0$ (self-similarity). Clearly, $B_{1/2}$ is a two-sided Brownian motion, and $B_1$ is a random straight line: $B_1(t) = tZ$ where $Z \sim N(0, 1)$. The increments are negatively correlated if $H < 1/2$, and positively correlated if $H > 1/2$. Increasing $H$ results in smoother sample paths.
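
As an illustration of the covariance function (4), the following minimal Python sketch simulates fBm trajectories on a grid by a Cholesky factorization of the covariance matrix. The function names `fbm_cov` and `simulate_fbm`, the grid, and the seed are hypothetical choices made for this example; the grid excludes $t = 0$, where the process is identically zero and the covariance matrix would be singular.

```python
import numpy as np

def fbm_cov(t, H):
    """Covariance matrix of a standard fBm at the times in t, following (4)."""
    t = np.asarray(t, dtype=float)
    s, u = np.meshgrid(t, t, indexing="ij")
    return 0.5 * (np.abs(s) ** (2 * H) + np.abs(u) ** (2 * H)
                  - np.abs(s - u) ** (2 * H))

def simulate_fbm(n_paths, t, H, rng):
    """Draw n_paths fBm trajectories on the grid t (t must not contain 0)."""
    L = np.linalg.cholesky(fbm_cov(t, H))       # lower-triangular factor
    return rng.standard_normal((n_paths, len(t))) @ L.T

rng = np.random.default_rng(0)                  # illustrative seed
grid = np.linspace(0.01, 1.0, 100)              # illustrative grid on (0, 1]
paths = simulate_fbm(2000, grid, H=0.3, rng=rng)
emp_var_at_1 = paths[:, -1].var()               # should be near Var B_H(1) = 1
```

The Cholesky approach costs $O(m^3)$ in the grid size $m$, which is acceptable for grids of a few hundred points; it is used here only because it follows directly from (4).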

Suppose $(X_i, Y_i)$, $i = 1, \ldots, n$, are i.i.d. copies of $(X, Y)$ satisfying the model (1). The unknown parameter is $\eta = (\alpha, \beta, \theta) \in \Xi = \mathbb{R}^2 \times [0, 1]$, and its true value is denoted $\eta_0 = (\alpha_0, \beta_0, \theta_0)$. The following conditions are needed:

(A1) $X$ is a fBm with Hurst exponent $H \in (0, 1)$.
(A2) $0 < \theta_0 < 1$ and $\beta_0 \neq 0$.
(A3) $E|\varepsilon|^{2+\delta} < \infty$ for some $\delta > 0$.

The construction of the least squares estimator $\hat\eta_n = (\hat\alpha_n, \hat\beta_n, \hat\theta_n)$, defined by (2), does not involve any assumptions about the distribution of the trajectories, whereas the asymptotic behavior does. Our first result gives the consistency and asymptotic distribution of $\hat\eta_n$ under the above assumptions.

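Theorem 2.1. If (A1) and (A2) hold, then $\hat\eta_n$ is consistent, that is, $\hat\eta_n \stackrel{P}{\to} \eta_0$. If (A3) also holds, then

(5)    $\zeta_n \equiv \left(\sqrt{n}(\hat\alpha_n - \alpha_0),\ \sqrt{n}(\hat\beta_n - \beta_0),\ n^{1/(2H)}(\hat\theta_n - \theta_0)\right) \stackrel{d}{\to} \left(\sigma Z_1,\ |\theta_0|^{-H} \sigma Z_2,\ \operatorname{argmin}_{t \in \mathbb{R}} \left\{ 2\frac{\sigma}{|\beta_0|} B_H(t) + |t|^{2H} \right\}\right) \equiv \zeta$,

where $Z_1$ and $Z_2$ are i.i.d. $N(0, 1)$, independent of the fBm $B_H$.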
Remarks.

1. It may come as a surprise that the convergence rate of $\hat\theta_n$ increases as $H$ decreases, and becomes arbitrarily fast as $H \to 0$. A heuristic explanation is that fBm "travels further" with a smaller $H$, so independent trajectories of $X$ are likely to "cover different ground," making it easier to estimate $\theta_0$. In a nutshell, the smaller the Hurst exponent, the better the design.

2. It follows from (a slight extension of) Lemmas 2.5 and 2.6 of Kim and Pollard [15] that the third component of $\zeta$ is well defined.

3. Using the self-similarity of fBm, the asymptotic distribution of $\hat\theta_n$ can be expressed as the distribution of

(6)    $\Delta \equiv \left(\frac{\sigma}{|\beta_0|}\right)^{1/H} \operatorname{argmin}_{t \in \mathbb{R}} \left(B_H(t) + |t|^{2H}/2\right)$.

This distribution does not appear to have been studied in the literature except for $H = 1/2$ and $H = 1$ (standard normal). When $H = 1/2$, $X$ is a standard Brownian motion and the limiting distribution is given in terms of a two-sided Brownian motion with a triangular drift. Bhattacharya and Brockwell [2] showed that this distribution has a density that can be expressed in terms of the standard normal distribution function. It arises frequently in change-point problems under contiguous asymptotics [24, 34, 37].

4. From the proof, it can be seen that the essential assumptions on $X$ are the self-similarity and stationary increments properties in some neighborhood of $\theta_0$, along with the trajectories of $X$ being Lipschitz of all orders less than $H$. Note that any Gaussian self-similar process with stationary increments and zero mean is a fBm (see, e.g., Theorem 1.3.3 of [9]).

5. The trajectories of fBm are nondifferentiable when $H < 1$, so the usual technique of Taylor expanding the criterion function about $\theta_0$ does not work and a more sophisticated approach is required to prove the result.

6. Note that $(\hat\alpha_n, \hat\beta_n)$ has the same limiting behavior as though $\theta_0$ is known, and $\hat\theta_n$ and $(\hat\alpha_n, \hat\beta_n)$ are asymptotically independent.

7. The result is readily extended to allow for additional covariates [cf. (11)], which is often important in applications. The limiting distribution of $\hat\theta_n$ remains the same, and the other regression coefficient estimates have the same limiting behavior as though $\theta_0$ is known.

8. Note that the assumption $\beta_0 \neq 0$ is crucial for the theorem to hold. When $\beta_0 = 0$, the fBm does not influence the response at all and $\hat\theta_n$ contains no information about $\theta_0$.
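
The least squares estimator (2) can be computed on a grid by profiling: for each candidate $\theta$, regress $Y$ on $X(\theta)$ with an intercept and keep the $\theta$ with the smallest residual sum of squares. The Python sketch below is one way to do this under stated assumptions; the function name `fit_point_impact`, the grid, the seed, and the Brownian motion ($H = 1/2$) test data are hypothetical choices for illustration and are not taken from the paper's own implementation.

```python
import numpy as np

def fit_point_impact(X, Y):
    """Least squares fit of Y = alpha + beta * X(theta) + error over a grid.

    X has shape (n, m): row i holds trajectory i evaluated on the grid.
    Y has shape (n,). Returns (alpha_hat, beta_hat, k_hat), where k_hat is
    the grid index minimizing the residual sum of squares.
    """
    Xc = X - X.mean(axis=0)                   # center each grid column
    Yc = Y - Y.mean()
    sxx = (Xc ** 2).sum(axis=0)               # positive if the grid excludes t = 0
    sxy = Xc.T @ Yc
    rss = (Yc ** 2).sum() - sxy ** 2 / sxx    # RSS of the simple regression at each grid point
    k_hat = int(np.argmin(rss))
    beta_hat = sxy[k_hat] / sxx[k_hat]
    alpha_hat = Y.mean() - beta_hat * X[:, k_hat].mean()
    return alpha_hat, beta_hat, k_hat

# Illustrative data: Brownian motion (H = 1/2) built from cumulative sums.
rng = np.random.default_rng(1)
grid = np.arange(1, 101) / 100.0
n = 40
X = np.cumsum(rng.standard_normal((n, grid.size)) * np.sqrt(0.01), axis=1)
k0 = 49                                       # grid[49] = 0.5 plays the role of theta_0
Y = 0.0 + 1.0 * X[:, k0] + 0.3 * rng.standard_normal(n)

alpha_hat, beta_hat, k_hat = fit_point_impact(X, Y)
theta_hat = grid[k_hat]
```

Profiling over $\theta$ keeps the inner problem a closed-form simple regression, so the whole fit is a single pass over the grid.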

3. Bootstrap confidence intervals. In general, a valid Wald-type confidence interval for $\theta_0$ would at least need a consistent estimator of the Hurst exponent $H$, which is a nuisance parameter in this problem. Unfortunately, however, accurate estimation of $H$ is difficult and quite often unstable. Bootstrap methods have been widely applied to avoid issues of nuisance parameter estimation, and they work well in problems with $\sqrt{n}$-rates; see [3, 32, 33] and the references therein. In this section we study the consistency properties of two bootstrap methods that arise naturally in our setting. One of these methods leads to a valid confidence interval for $\theta_0$ without requiring any knowledge of $H$.

3.1. Preliminaries. We start with a brief review of the bootstrap. Given a sample $\mathbf{Z}_n = \{Z_1, Z_2, \ldots, Z_n\} \stackrel{\text{i.i.d.}}{\sim} L$ from an unknown distribution $L$, suppose that the distribution function, $F_n$, say, of some random variable $R_n \equiv R_n(\mathbf{Z}_n, L)$, is of interest; $R_n$ is usually called a root and it can in general be any measurable function of the data and the distribution $L$. The bootstrap method can be broken into three simple steps:

(i) Construct an estimator $L_n$ of $L$ from $\mathbf{Z}_n$.

(ii) Generate $\mathbf{Z}_n^* = \{Z_1^*, \ldots, Z_n^*\} \stackrel{\text{i.i.d.}}{\sim} L_n$ given $\mathbf{Z}_n$.

(iii) Estimate $F_n$ by $F_n^*$, the conditional c.d.f. of $R_n(\mathbf{Z}_n^*, L_n)$ given $\mathbf{Z}_n$.

Let $d$ denote the Levy metric or any other metric metrizing weak convergence of distribution functions. We say that $F_n^*$ is weakly consistent if $d(F_n, F_n^*) \stackrel{P}{\to} 0$; if $F_n$ has a weak limit $F$, this is equivalent to $F_n^*$ converging weakly to $F$ in probability.

The choice of $L_n$ mostly considered in the literature is the empirical distribution. Intuitively, an $L_n$ that mimics the essential properties (e.g., smoothness) of the underlying distribution $L$ can be expected to perform well. In most situations, the empirical distribution of the data is a good estimator of $L$, but in some nonstandard situations it may fail to capture some of the important aspects of the problem, and the corresponding bootstrap method can be suspect. The following discussion illustrates this phenomenon (the inconsistency when bootstrapping from the empirical distribution of the data) when $\Delta_n \equiv n^{1/(2H)}(\hat\theta_n - \theta_0)$ is the random variable of interest.

3.2. Inconsistency of bootstrapping pairs. In a regression setup there are two natural ways of bootstrapping: bootstrapping pairs (i.e., the nonparametric bootstrap) and bootstrapping residuals (while keeping the predictors fixed). We show that bootstrapping pairs (drawing $n$ data points with replacement from the original data set) is inconsistent for $\theta_0$.

Theorem 3.1. Under conditions (A1)-(A3), the nonparametric bootstrap is inconsistent for estimating the distribution of $\Delta_n$, that is, $\Delta_n^* \equiv n^{1/(2H)}(\hat\theta_n^* - \hat\theta_n)$, conditional on the data, does not converge in distribution to $\Delta$ in probability, where $\Delta$ is defined by (6).

3.3. Consistency of bootstrapping residuals. Another bootstrap procedure is to use the form of the assumed model more explicitly to draw the bootstrap samples: condition on the predictor $X_i$ and generate its response as

(7)    $Y_i^* = \hat\alpha_n + \hat\beta_n X_i(\hat\theta_n) + \varepsilon_i^*$,

where the $\varepsilon_i^*$ are conditionally i.i.d. under the empirical distribution of the centered residuals $\hat\varepsilon_i - \bar\varepsilon_n$, with $\hat\varepsilon_i = Y_i - \hat\alpha_n - \hat\beta_n X_i(\hat\theta_n)$ and $\bar\varepsilon_n = \sum_{i=1}^{n} \hat\varepsilon_i / n$. Let $\hat\alpha_n^*$, $\hat\beta_n^*$ and $\hat\theta_n^*$ be the estimates of the unknown parameters obtained from the bootstrap sample. We approximate the distribution of $\zeta_n$ [see (5)] by the conditional distribution of

$\zeta_n^* \equiv \left(\sqrt{n}(\hat\alpha_n^* - \hat\alpha_n),\ \sqrt{n}(\hat\beta_n^* - \hat\beta_n),\ n^{1/(2H)}(\hat\theta_n^* - \hat\theta_n)\right)$

given the data.

Theorem 3.2. Under conditions (A1)-(A3), the above procedure of bootstrapping residuals is consistent for estimating the distribution of $\zeta_n$, that is, $\zeta_n^* \stackrel{d}{\to} \zeta$, in probability, conditional on the data.

We now use the above theorem to construct a valid confidence interval (CI) for $\theta_0$ that does not require any knowledge of $H$. Let $q_\alpha^*$ be the $\alpha$-quantile of the conditional distribution of $(\hat\theta_n^* - \hat\theta_n)$ given the data, which can be readily obtained via simulation and does not involve the knowledge of any distributional properties of $X$. The proposed approximate $(1 - 2\alpha)$-level bootstrap CI for $\theta_0$ is then given by

$C_n = [\hat\theta_n - q_{1-\alpha}^*,\ \hat\theta_n - q_\alpha^*]$.

Under the assumptions of Theorem 3.2, the coverage probability of this CI is

$P\{\theta_0 \in C_n\} = P\{n^{1/(2H)} q_\alpha^* \le \Delta_n \le n^{1/(2H)} q_{1-\alpha}^*\}$
$\approx P^*\{q_\alpha^* \le \hat\theta_n^* - \hat\theta_n \le q_{1-\alpha}^*\}$
$= 1 - 2\alpha$,

where $P^*$ denotes the bootstrap distribution given the data, and we have used the fact that the supremum distance between the relevant distribution functions of $\Delta_n$ and $\Delta_n^*$ is asymptotically negligible. The key point of this argument is that $\Delta_n$ and $\Delta_n^*$ have the same normalization factor $n^{1/(2H)}$ and, thus, it "cancels" out. CIs for $\alpha_0$ and $\beta_0$ can be constructed in a similar fashion.
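
The residual bootstrap CI described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `fit` and `residual_bootstrap_ci`, the number of bootstrap replicates `B`, the grid, the seed, and the Brownian motion ($H = 1/2$) test data are all hypothetical. Note that $H$ never enters the computation.

```python
import numpy as np

def fit(X, Y):
    """Grid least squares for Y = alpha + beta * X(theta) + error; returns (alpha, beta, index)."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean()
    sxx, sxy = (Xc ** 2).sum(axis=0), Xc.T @ Yc
    k = int(np.argmin((Yc ** 2).sum() - sxy ** 2 / sxx))
    beta = sxy[k] / sxx[k]
    return Y.mean() - beta * X[:, k].mean(), beta, k

def residual_bootstrap_ci(X, Y, grid, alpha=0.025, B=200, rng=None):
    """Approximate (1 - 2*alpha)-level CI for theta_0 by bootstrapping residuals as in (7)."""
    rng = np.random.default_rng() if rng is None else rng
    a_hat, b_hat, k_hat = fit(X, Y)
    fitted = a_hat + b_hat * X[:, k_hat]
    resid = Y - fitted
    resid = resid - resid.mean()                       # centered residuals
    diffs = np.empty(B)
    for r in range(B):
        Y_star = fitted + rng.choice(resid, size=len(Y), replace=True)
        _, _, k_star = fit(X, Y_star)                  # predictors X stay fixed
        diffs[r] = grid[k_star] - grid[k_hat]          # theta*_n - theta_hat_n
    q_lo, q_hi = np.quantile(diffs, [alpha, 1 - alpha])
    return grid[k_hat] - q_hi, grid[k_hat] - q_lo

# Illustrative data: Brownian motion trajectories, theta_0 = 0.5, sigma = 0.3.
rng = np.random.default_rng(3)
grid = np.arange(1, 101) / 100.0
X = np.cumsum(rng.standard_normal((40, grid.size)) * np.sqrt(0.01), axis=1)
Y = X[:, 49] + 0.3 * rng.standard_normal(40)
lo, hi = residual_bootstrap_ci(X, Y, grid, rng=rng)
```

In practice the number of replicates would typically be larger than the `B=200` used here to keep the example fast.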

3.4. Discussion. In nonparametric regression settings, dichotomies in 
the behavior of different bootstrap methods are well known, for example, 
when using the bootstrap to calibrate omnibus goodness-of-fit tests for para- 
metric regression models; see [14, 25, 36] and references therein. A dichotomy 
in the behavior of the two bootstrap methods, however, is surprising in a 
linear regression model. This illustrates that in problems with nonstandard 
asymptotics, the usual nonparametric bootstrap might fail, whereas a re- 
sampling procedure that uses some particular structure of the model can 
perform well. The improved performance of bootstrapping residuals will be 
confirmed by our simulation results in Section 6. 

The difference in the behavior of the two bootstrap methods can be ex- 
plained as follows. As in any M-estimation problem, the standard approach 
is to study the criterion (objective) function being optimized, in a neigh- 
borhood of the target parameter, by splitting it into an empirical process 
and a drift term. The drift term has different behavior for the two boot- 
strap methods: while bootstrapping pairs, it does not converge, whereas 
the bootstrapped residuals are conditionally independent of the predictors, 
and the drift term converges. This highlights the importance of designing 
the bootstrap to accurately replicate the structure in the assumed model. 
A more technical explanation is provided in a remark following the proof of 
Theorem 3.2. 

Other types of resampling (e.g., the m-out-of-n bootstrap, or subsampling) could be applicable, but such methods require knowledge of the rate of convergence, which depends on the unknown $H$. Also, these methods require the choice of a tuning parameter, which is problematic in practice. The residual bootstrap, by contrast, is consistent, easy to implement, and requires neither knowledge of $H$ nor the choice of a tuning parameter.

The inconsistency of the nonparametric bootstrap casts some doubt on
its validity for checking the stability of variable selection results in high- 
dimensional regression problems (as is common practice). Indeed, it suggests 
that more care (in terms of more explicit use of the model) is needed in the 
choice of a bootstrap method in such settings. 



4. Misspecification by a functional linear model. The point impact model 
cannot capture effects that are spread out over the domain of the trajectory, 
for example, gene expression profiles for which the effect on a clinical out- 
come involves complex interactions between numerous genes. Such effects, 
however, may be represented by a functional linear model, and we now examine
how the limiting behavior of $\hat\theta_n$ changes when the data arise in this
way.



4.1. Complete misspecification. In this case we treat (1) as the working model (for fitting the data), but view this model as being completely misspecified in the sense that the data are generated from the functional linear model (3). For simplicity, we set $\alpha = 0$ and $\beta = 1$ in the working model, and set $\alpha = 0$ in the true functional linear model. The least squares estimator $\hat\theta_n$ now estimates the minimizer $\theta_0$ of

$M(\theta) \equiv E[Y - X(\theta)]^2 = \sigma^2 + E\left[\int_0^1 f(t) X(t)\,dt - X(\theta)\right]^2$

and the following result gives its asymptotic distribution.



Theorem 4.1. Suppose that (A1) and (A3) hold, and that $M(\theta)$ has a unique minimizer and is twice-differentiable at $0 < \theta_0 < 1$. Then, in the misspecified case,

$n^{1/(4-2H)}(\hat\theta_n - \theta_0) \stackrel{d}{\to} \operatorname{argmin}_{t \in \mathbb{R}} \left(2a B_H(t) + b t^2\right)$,

where $a^2 = M(\theta_0)$ and $b = M''(\theta_0)/2$.



Remarks.

1. Here the rate of convergence reverses itself from the correctly specified case: the convergence rate now decreases as $H$ decreases, going from a parametric rate of $n^{1/2}$ when $H \to 1$, to as slow as $n^{1/4}$ as $H \to 0$. A heuristic explanation is that roughness in $X$ now amounts to measurement error (which results in a slower rate) as the fluctuations of $X$ are smoothed out in the true model.

2. In the case of Brownian motion trajectories ($H = 1/2$), note that $M(\theta) = \theta - 2\int_0^1 f(t) \min(t, \theta)\,dt + \text{const}$, the normal equation is

(8)    $M'(\theta) = 1 - 2\int_\theta^1 f(t)\,dt = 0$

and $M''(\theta) = 2f(\theta)$.

3. Also in the case $H = 1/2$, the limiting distribution is given in terms of two-sided Brownian motion with a parabolic drift, and was investigated originally by Chernoff [6] in connection with the estimation of the mode of a distribution, and shown to have a density (as the solution of a heat equation). The Chernoff distribution arises frequently in monotone function estimation settings; Groeneboom and Wellner [12] introduced various algorithms for computation of its distribution function and quantiles.
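
For completeness, the normal equation (8) in the Brownian motion case can be checked by a short calculation. The following worked derivation is a sketch that uses only $E[X(t)X(\theta)] = \min(t, \theta)$ and $E[X(\theta)^2] = \theta$ for $H = 1/2$, together with the continuity of $f$ assumed for model (3).

```latex
\begin{align*}
M(\theta) &= \sigma^2 + E\Big[\int_0^1 f(t)X(t)\,dt\Big]^2
            - 2\int_0^1 f(t)\,E[X(t)X(\theta)]\,dt + E[X(\theta)^2] \\
          &= \theta - 2\int_0^1 f(t)\min(t,\theta)\,dt + \text{const}, \\
\int_0^1 f(t)\min(t,\theta)\,dt
          &= \int_0^\theta t f(t)\,dt + \theta\int_\theta^1 f(t)\,dt, \\
\frac{d}{d\theta}\int_0^1 f(t)\min(t,\theta)\,dt
          &= \theta f(\theta) + \int_\theta^1 f(t)\,dt - \theta f(\theta)
           = \int_\theta^1 f(t)\,dt, \\
M'(\theta) &= 1 - 2\int_\theta^1 f(t)\,dt, \qquad M''(\theta) = 2 f(\theta).
\end{align*}
```

In particular, $\theta_0$ solves $\int_{\theta_0}^1 f(t)\,dt = 1/2$, and $M''(\theta_0) > 0$ requires $f(\theta_0) > 0$.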

4.2. Partial misspecification. The nonparametric functional linear model (3) can be combined with (1) to give the semiparametric model

(9)    $Y = \alpha + \beta X(\theta) + \int_0^1 f(t) X(t)\,dt + \varepsilon$,

which allows $X$ to have both a point impact and an influence that is spread out continuously in time. When $f = 0$, this model reduces to the point impact model; when $\beta = 0$, to the functional linear model. In this section we examine the behavior of $\hat\theta_n$ when the working model is (1), as before, but the data are now generated from (9).

For simplicity, suppose that $\alpha = 0$ and $\beta = 1$ in both the working point impact model and in the true model (9). Denote the true value of $\theta$ by $\theta_0 \in (0, 1)$. It can then be shown that $\hat\theta_n$ is robust to small levels of misspecification, that is, it consistently estimates $\theta_0$ with the same rate of convergence as in the correctly specified case. Indeed, $\hat\theta_n$ targets the minimizer of the criterion function

$M(\theta) = E[Y - X(\theta)]^2 = |\theta - \theta_0|^{2H} - \int_0^1 f(t)\left[t^{2H} + \theta^{2H} - |\theta - t|^{2H}\right] dt + \text{const}$.

Provided $\int |f|$ is sufficiently small, the derivative of $M$ will be negative over the interval $(0, \theta_0)$ and positive over $(\theta_0, 1)$, so $M$ is minimized at $\theta_0$. It is then possible to extend Theorem 2.1 to give

(10)    $n^{1/(2H)}(\hat\theta_n - \theta_0) \stackrel{d}{\to} a^{1/H} \operatorname{argmin}_{t \in \mathbb{R}} \left(B_H(t) + |t|^{2H}/2\right)$,

where $a > \sigma$ is defined in the statement of Theorem 4.1. This shows that the effect of partial misspecification is a simple inflation of the variance [cf. (6)], without any change in the form of the limit distribution.

It is also of interest to estimate $\theta_0$ in a way that adapts to any function $f$ (i.e., sufficiently smooth) in this semiparametric setting. This could be done, for example, by approximating $f$ by a finite B-spline basis expansion of the form $f_m(t) = \sum_{j=1}^{m} \beta_j \phi_j(t)$, and fitting the working model

(11)    $Y = \alpha + \beta X(\theta) + \sum_{j=1}^{m} \beta_j Z_j + \varepsilon$,

where $Z_j = \int_0^1 \phi_j(t) X(t)\,dt$ are additional covariates with regression coefficients $\beta_j$; the resulting least squares estimator $\hat\theta_n$ can then be used as an estimate of $\theta_0$. For the working model (11), the misspecification is $f - f_m$, which will be small if $m$ is sufficiently large. Therefore, based on our previous discussion, $\hat\theta_n$ will satisfy a result of the form (10); in particular, $\hat\theta_n$ will exhibit the fast $n^{1/(2H)}$-rate of convergence. Note that for this result to hold, $m$ can be fixed and does not need to tend to infinity with the sample size.

5. Two-sample problem. In this section we discuss a variation of the 
point impact regression model in which the response takes just two values 
(say ±1). This is of interest, for example, in case-control studies in which 
gene-expression data are available for a sample of cancer patients and a 
sample of healthy controls, and the target parameter is the locus of a differ- 
entially expressed gene. 

Suppose we have two independent samples of trajectories $X$, with $n_1$ trajectories from class 1, and $n_2$ trajectories from class $-1$, for a total sample size of $n = n_1 + n_2$. It is assumed that $\rho = n_1/n_2 > 0$ remains fixed, and the $j$th sample satisfies the model

$X_{ij}(t) = \mu_j(t) + \varepsilon_{ij}(t)$,    $j = 1, 2$,

where $\varepsilon_{ij}$, $i = 1, \ldots, n_j$, are i.i.d. fBms with Hurst exponent $H \in (0, 1)$, and $\mu_j(t)$ is an unknown mean function, assumed to be continuous. The treatment effect $M(t) = \mu_1(t) - \mu_2(t)$ is taken to have a point impact in the sense of having a unique maximum at $\theta_0 \in (0, 1)$; minima can of course be treated in a similar fashion. The least squares estimator of the sensitive time point now becomes

(12)    $\hat\theta_n = \operatorname{argmax}_{\theta} \{\bar X_1(\theta) - \bar X_2(\theta)\}$,

where $\bar X_j(\theta) = \sum_{i=1}^{n_j} X_{ij}(\theta)/n_j$ is the sample mean for class $j$. Although a studentized version (normalizing the difference of the sample means by a standard error estimate) might be preferable in some cases, with small or unbalanced samples, say, to keep the discussion simple, we restrict attention to $\hat\theta_n$. The empirical criterion function $M_n(\theta) = \bar X_1(\theta) - \bar X_2(\theta)$ converges uniformly to $M(\theta)$ a.s. (by the Glivenko-Cantelli theorem), so $\hat\theta_n$ is a consistent estimator of $\theta_0$.

As before, our objective is to find a confidence interval for $\theta_0$ based on $\hat\theta_n$ under appropriate conditions on the treatment effect. Toward this end, we need an assumption on the degree of smoothness of the treatment effect at $\theta_0$ in terms of an exponent $0 < S \le 1$:

$M(\theta) = M(\theta_0) - c|\theta - \theta_0|^{2S} + o(|\theta - \theta_0|^{2S})$

as $\theta \to \theta_0$, where $c > 0$. If $M$ is twice-differentiable at $\theta_0$, then this assumption holds only with $S = 1$; for it to hold for some $S < 1$, a cusp is needed. When the smoothness of the treatment effect and the fBm match, that is, $S = H$, the rate of convergence of $\hat\theta_n$ is $n^{1/(2H)}$, as before, and $\hat\theta_n$ has a nondegenerate limit distribution of the same form as in Theorem 2.1:

(13)    $n_1^{1/(2H)}(\hat\theta_n - \theta_0) \stackrel{d}{\to} \operatorname{argmin}_{t \in \mathbb{R}} \{(1 + \sqrt{\rho}) B_H(t) + c|t|^{2H}\}$.

The key step in the proof (which is simpler than in the regression case) is given at the end of Section 8.
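
The two-sample estimator (12) is simple to compute on a grid. The Python sketch below illustrates it under assumptions chosen purely for the example: Brownian motion noise ($H = 1/2$), a hypothetical cusp-shaped mean $\mu_1(t) = -c|t - \theta_0|$ (so that $2S = 1$ matches $2H$), $\mu_2 = 0$, and arbitrary sample sizes and seed.

```python
import numpy as np

rng = np.random.default_rng(2)                  # illustrative seed
grid = np.arange(1, 101) / 100.0                # grid on (0, 1]

def brownian_paths(n_paths):
    """Brownian motion (fBm with H = 1/2) on the grid, via cumulative sums."""
    steps = rng.standard_normal((n_paths, grid.size)) * np.sqrt(0.01)
    return np.cumsum(steps, axis=1)

theta0, c = 0.5, 2.0                            # hypothetical true values
mu1 = -c * np.abs(grid - theta0)                # cusp at theta0, exponent 2S = 1
mu2 = np.zeros_like(grid)
n1, n2 = 200, 200

X1 = mu1 + brownian_paths(n1)                   # class 1 trajectories
X2 = mu2 + brownian_paths(n2)                   # class -1 trajectories

M_n = X1.mean(axis=0) - X2.mean(axis=0)         # empirical criterion function
theta_hat = grid[int(np.argmax(M_n))]           # estimator (12) restricted to the grid
```

As in the regression case, the grid resolution limits the attainable accuracy of `theta_hat`.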

6. Numerical examples. In this section we report some numerical results 
based on trajectories from fBm simulations and from gene expression data. 

We first consider a correctly specified example as in Section 2 and study the behavior of CIs for the sensitive time-point $\theta_0$ using the two bootstrap based methods, and compare them with the $100(1 - \alpha)\%$ Wald-type CI

(14)    $\hat\theta_n \pm \left(\frac{\hat\sigma_n}{|\hat\beta_n| \sqrt{n}}\right)^{1/H} z_{H,\alpha/2}$

with $H$ assumed to be known. Here $\hat\sigma_n$ is the sample standard deviation of the residuals, and $z_{H,\alpha}$ is the upper $\alpha$-quantile of $\operatorname{argmin}_{t \in \mathbb{R}}(B_H(t) + |t|^{2H}/2)$. In practice, $H$ needs to be estimated to apply (14). Numerous estimators of $H$ based on a single realization of $X$ have been proposed in the literature [1, 7], although observation at fine time scales is required for such estimators to work well, and it is not clear that direct plug-in would be satisfactory. The quantiles $z_{H,\alpha/2}$ needed to compute the Wald-type CIs were extracted from an extensive simulation of the limit distribution, as no closed form expression is available.

Table 1 reports the estimated coverage probabilities and average lengths of nominal 95% confidence intervals for $\theta_0$ calculated using 500 independent samples. The data were generated from the model (1), for $\alpha_0 = 0$, $\beta_0 = 1$, $\theta_0 = 1/2$, $\varepsilon \sim N(0, \sigma^2)$ where $\sigma = 0.3$ and $0.5$, the Hurst exponent $H = 0.3, 0.5, 0.7$ and sample sizes $n = 20$ and $40$. To calculate the least squares estimators (2), we restricted $\theta$ to a uniform grid of 101 points in $[0,1]$; the fBm trajectories were generated over the same grid. The fBm simulations were carried out in R, using the function fbmSim from the fArma package, and via the Cholesky method of decomposing the covariance matrix of $X$. Histograms and scatterplots of $\hat\theta_n$ and $\hat\beta_n$ for $H = 0.3, 0.5, 0.7$ when $\sigma = 0.5$ are displayed in Figure 2.

Table 1
Monte Carlo results for coverage probabilities and average widths of nominal 95% confidence intervals for $\theta_0$; data simulated from the linear model with $\theta_0 = 1/2$, $\alpha_0 = 0$ and $\beta_0 = 1$

                      Wald-H           R bootstrap        NP bootstrap
 n    sigma   H     Cover   Width     Cover   Width     Cover   Width
 20   0.3     0.3   0.874   0.023     0.924   0.044     1.000   0.174
              0.5   0.880   0.088     0.946   0.119     0.992   0.220
              0.7   0.822   0.170     0.912   0.249     0.970   0.360
      0.5     0.3   0.806   0.129     0.912   0.211     0.998   0.410
              0.5   0.852   0.256     0.924   0.333     0.988   0.487
              0.7   0.834   0.352     0.938   0.510     0.962   0.591
 40   0.3     0.3   0.984   0.007     0.986   0.002     1.000   0.022
              0.5   0.892   0.048     0.942   0.053     0.992   0.087
              0.7   0.898   0.108     0.930   0.138     0.976   0.182
      0.5     0.3   0.900   0.039     0.928   0.054     0.998   0.149
              0.5   0.908   0.134     0.950   0.165     0.990   0.251
              0.7   0.856   0.229     0.946   0.332     0.962   0.386

In practice, $X$ can only be observed at discrete time points, so restricting to a grid is the natural formulation for this example. Indeed, the resolution of the observation times in the neighborhood of $\theta_0$ is a limiting factor for the accuracy of $\hat\theta_n$, so the grid resolution needs to be fine enough for the statistical behavior of $\hat\theta_n$ to be apparent. For large sample sizes, a very fine grid would be needed in the case of a small Hurst exponent (cf. Theorem 2.1). Indeed, the histogram of $\hat\theta_n$ in the case $H = 0.3$ (the first plot in Figure 2) shows that the resolution of the grid is almost too coarse to see the statistical variation, as the bin centered on $\theta_0 = 1/2$ contains almost 80% of the estimates. This phenomenon is also observed in Table 1 when $n = 40$ and $\sigma = H = 0.3$. The average length of the CIs is smaller than the resolution of the grid and, thus, we observe an over-coverage. The two histograms of $\hat\theta_n$ for $H = 0.5$ and $H = 0.7$, however, show increasing dispersion and become closer to bell-shaped as $H$ increases.

Recall that, for simplicity, the Wald-type CIs are computed as if $H$ were known, which should be an advantage, yet the residual bootstrap performs better according to the results in Table 1. We see that usually the Wald-type CIs have coverage less than the nominal 95%, whereas the inconsistent nonparametric bootstrap method over-covers with observed coverage probability close to 1. Accordingly, the average lengths of the Wald-type CIs are the smallest, whereas those obtained from the nonparametric bootstrap method are the widest. The behavior of CIs obtained from the nonparametric bootstrap method also illustrates the inconsistency of this procedure. A similar phenomenon was also observed in [20] in connection with estimators that converge at $n^{1/3}$-rate.

Fig. 2. Histograms and scatterplots of $\hat\theta_n$ and $\hat\beta_n$ in the correctly specified case for $H = 0.3$ (top row), $H = 0.5$ (middle row) and $H = 0.7$ (bottom row), based on 500 samples of size $n = 20$.

Despite the asymptotic independence of $\hat\theta_n$ and $\hat\beta_n$, considerable
correlation is apparent in the scatterplots in Figure 2, with increasing negative
correlation as H increases; note, however, that when H = 1 there is a lack
of identifiability of $\theta$ and $\beta$, so the trend in the correlation as H approaches
1 is to be expected in small samples.

Fig. 3. Same as Figure 2 except in the partially misspecified case.



Next we consider a partially misspecified example, in which the data are
now generated from (9) by setting $f(t) = 1/2$ and $\theta_0 = 1/2$, but the other
ingredients are unchanged from the correctly specified example. The plots
in Figure 3 correspond to those in Figure 2. The effect of misspecification
on $\hat\theta_n$ is a slight increase in dispersion but no change in mean; the effect on
$\hat\beta_n$ is a substantial shift in mean along with a slight increase in dispersion.

6.1. Gene expression example. Next we consider the gene expression 
data mentioned in connection with Figure 1, to see how the residual boot- 
strap performs with such trajectories. The trajectories consist of log gene 
expression levels from the breast tissue of n = 40 breast cancer patients, 
along a sequence of 518 loci from chromosome 17. The complete gene
expression data set is described in Richardson et al. [29]. Although a
continuous response is not available for this particular data set, it is available in
numerous other studies of this type; see the references mentioned in the
Introduction.

Fig. 4. Gene expression example: histograms of $\hat\theta_n$ based on 1000 residual bootstrap
samples and simulated responses with $\sigma = 0.01$ (left), $\sigma = 0.03$ (middle) and $\sigma = 0.1$ (right).

To construct a scalar response, we generated $Y_i$ using the point impact
model (1) with $\alpha_0 = 0$ and $\beta_0 = 1$, $\theta_0 = 0.5$ (corresponding to the position
of 259 base pairs along the chromosome) and $\varepsilon \sim N(0, \sigma^2)$ for various values
of $\sigma$. As previously noted, the trajectories are very rough in this example
(with H estimated to be about 0.1), which implies a rapid rate of
convergence for $\hat\theta_n$. We find that an abrupt transition in the behavior of the
residual bootstrap occurs as $\sigma$ increases: for small $\sigma$, the residual bootstrap
estimates become degenerate at $\theta_0$ due to the relatively coarse resolution;
for moderately large $\sigma$, although a considerable portion of the estimates are
concentrated at $\theta_0$, they become spread out over the 518 loci; for very large
$\sigma$, the estimates are more or less uniformly scattered along the chromosome.
Indeed, this is consistent with the behavior of the Wald-type CI (14) having
width proportional to $\sigma^{1/H}$, which blows up dramatically as $\sigma$ increases
when H is small.
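To get a feel for the $\sigma^{1/H}$ scaling, the short Python sketch below computes the factor by which the Wald-type CI width changes when the noise level changes, holding everything else fixed. The function name `wald_width_ratio` is hypothetical and the sketch only encodes the proportionality stated above, not the full interval (14).

```python
def wald_width_ratio(sigma_new, sigma_old, hurst):
    """Ratio of Wald-type CI widths, assuming width is proportional to sigma**(1/H)."""
    return (sigma_new / sigma_old) ** (1.0 / hurst)


# Tripling sigma (0.01 -> 0.03) for several Hurst exponents.
for hurst in (0.1, 0.5, 0.7):
    print(hurst, wald_width_ratio(0.03, 0.01, hurst))
```

For H = 0.1 the ratio is $3^{10} = 59049$, so a threefold increase in the noise level is enough to move from a degenerate interval to one spanning the whole chromosome.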

In Figure 4 we plot the bootstrap distribution of $\hat\theta_n$ (obtained from 1000
residual bootstrap samples in each case) for $\sigma = 0.01$, 0.03 and 0.1. When
$\sigma = 0.01$, the bootstrap distribution is degenerate at $\theta_0$; the resolution of the
grid is too coarse to see any statistical fluctuation in this case. When $\sigma$ is
moderate, namely, 0.03, although the bootstrap distribution has a peak at
$\theta_0$, the mass is widely scattered and the resulting CI now covers almost the
entire chromosome. Further increasing the noise level causes the bootstrap
distribution to become even more dispersed and its mode moves away from
$\theta_0$; the sample size of 40 is now too small for the method to locate the
neighborhood of $\theta_0$.

7. Concluding remarks. In this paper we have developed a point impact 
functional linear regression model for use with trajectories as predictors of 
a continuous scalar response. It is expected that the proposed approach will
be useful when there are sensitive time points at which the trajectory has
an effect on the response. We have derived the rates of convergence and 
the explicit limiting distributions of the least squares estimator of such a 
parameter in terms of the Hurst exponent for fBm trajectories. We also 
established the validity of the residual bootstrap method for obtaining CIs 
for sensitive time points, avoiding the need to estimate the Hurst exponent. 
In addition, we have developed some results in the misspecified case in which 
the data are generated partially or completely from a standard functional 
linear model, and in the two-sample setting. 

Although for simplicity of presentation we have assumed that the trajec- 
tories are fBm, it is clear from the proofs that it is only local properties of 
the trajectories in the neighborhood of the sensitive time point that drive 
the theory, and thus the validity of the confidence intervals. The consistency 
of the least squares estimator is of course needed, but this could be estab- 
lished under much weaker assumptions (namely, uniform convergence of the 
empirical criterion function and the existence of a well-separated minimum 
of the limiting criterion function; cf. [35], page 287). 

Exploiting the fractal behavior of the trajectories plays a crucial role 
in developing confidence intervals based on the least squares estimator of 
the sensitive time point, in contrast to standard functional linear regression 
where it is customary to smooth the predictor trajectories prior to fitting 
the model ([27], Chapter 15). Our approach does not require any prepro- 
cessing of the trajectories involving a choice of smoothing parameters, nor 
any estimation of nuisance parameters (namely, the Hurst exponent). On 
the other hand, functional linear regression is designed with prediction in 
mind, rather than interpretability, so in a sense the two approaches are
complementary. The tendency of functional linear regression to over-smooth a
point impact (see [21] for detailed discussion) is also due to the use of a 
roughness penalty on the regression function; the smoothing parameter is 
usually chosen by cross-validation, a criterion that optimizes for predictive 
performance. 

Our model naturally extends to allow multiple sensitive time points, and 
any model selection procedure having the oracle property (such as the adap- 
tive lasso) could be used to estimate the number of those sensitive time 
points. The bootstrap procedure for the (unregularized) least squares es- 
timator can then be adapted to provide individual CIs around each time 
point, although developing theoretical justification would be challenging. 
Other challenging problems would be to develop bootstrap procedures that 
are suitable for the two-sample problem and for the misspecified model set- 
tings. 

It should be feasible to carry through much of our program for certain 
types of diffusion processes driven by fBm, and also for processes having 
jumps. In the case of piecewise constant trajectories that have a single
jump, the theory specializes to an existing type of change-point analysis
[18]. Other possibilities include Lévy processes (which have stationary
independent increments) and multi-parameter fBm. It should also be possible
to develop versions of our results in the setting of censored survival data 
(e.g., Cox regression). Lindquist and McKeague [21] recently studied point 
impact generalized linear regression models in the case that X is standard 
Brownian motion and we expect that our approach can be extended to such 
models as well. 

8. Proofs. To avoid measurability problems and for simplicity of nota- 
tion, we will always use outer expectation/probability, and denote them by E 
and P; E* and P* will denote bootstrap conditional expectation/probability 
given the data. 

We begin with the proof of Theorem 2.1. The strategy is to establish
(a) consistency, (b) the rate of convergence and (c) the weak convergence of a
suitably localized version of the criterion function, and then (d) to apply the argmax
(or argmin) continuous mapping theorem.

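8.1. Consistency. We start with some notation. Let $m_\eta(Y,X) = [Y - \alpha - \beta X(\theta)]^2$
and let $\mathbb{M}_n(\eta) = \mathbb{P}_n m_\eta = \frac{1}{n}\sum_{i=1}^n [Y_i - \alpha - \beta X_i(\theta)]^2$, where $\mathbb{P}_n$
denotes the expectation with respect to the empirical measure of the data.
Let

$$
\begin{aligned}
\mathbb{M}(\eta) = P m_\eta &= (\alpha_0 - \alpha)^2 + P[\{\beta_0 X(\theta_0) - \beta X(\theta)\}^2] + \sigma^2 \\
&= (\alpha_0 - \alpha)^2 + \sigma^2 + (\beta_0 - \beta)^2 P[X^2(\theta_0)] + \beta^2 P[\{X(\theta_0) - X(\theta)\}^2] \\
&\quad + 2\beta(\beta_0 - \beta) P[X(\theta_0)\{X(\theta_0) - X(\theta)\}].
\end{aligned}
\tag{15}
$$

First observe that $\mathbb{M}(\eta)$ has a unique minimizer at $\eta_0$ as $P[\beta X(\theta) \ne \beta_0 X(\theta_0)] > 0$,
for all $(\beta,\theta) \in \mathbb{R} \times (0,1)$ with $(\beta,\theta) \ne (\beta_0,\theta_0)$.
Using the fBm covariance formula (4),

$$
\begin{aligned}
\mathbb{M}(\eta) - \mathbb{M}(\eta_0) &= (\alpha_0 - \alpha)^2 + (\beta_0 - \beta)^2 |\theta_0|^{2H} + \beta^2 |\theta_0 - \theta|^{2H} \\
&\quad + \beta(\beta_0 - \beta)\{|\theta_0|^{2H} + |\theta_0 - \theta|^{2H} - |\theta|^{2H}\} \\
&= (\alpha_0 - \alpha)^2 + (\beta_0 - \beta)^2 |\theta_0|^{2H} + \beta\beta_0 |\theta_0 - \theta|^{2H} \\
&\quad + \beta(\beta_0 - \beta)\{|\theta_0|^{2H} - |\theta|^{2H}\}.
\end{aligned}
\tag{16}
$$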
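The algebra behind (16) can be checked numerically. The Python sketch below evaluates the excess risk in three ways: directly from the fBm covariance, via the first expression in (16), and via the simplified second expression. The function names are hypothetical and the check assumes $\theta, \theta_0 \in (0,1]$ and mean-zero fBm trajectories, as in the text.

```python
def excess_risk_direct(alpha, beta, theta, alpha0, beta0, theta0, H):
    """(alpha0-alpha)^2 + E[{beta0 X(theta0) - beta X(theta)}^2] from the fBm covariance."""
    p = 2 * H
    cov = 0.5 * (theta0 ** p + theta ** p - abs(theta0 - theta) ** p)
    return ((alpha0 - alpha) ** 2 + beta0 ** 2 * theta0 ** p
            - 2 * beta0 * beta * cov + beta ** 2 * theta ** p)


def excess_risk_first_form(alpha, beta, theta, alpha0, beta0, theta0, H):
    """First expression in (16)."""
    p = 2 * H
    gap = abs(theta0 - theta) ** p
    return ((alpha0 - alpha) ** 2 + (beta0 - beta) ** 2 * theta0 ** p
            + beta ** 2 * gap
            + beta * (beta0 - beta) * (theta0 ** p + gap - theta ** p))


def excess_risk_second_form(alpha, beta, theta, alpha0, beta0, theta0, H):
    """Second (simplified) expression in (16)."""
    p = 2 * H
    gap = abs(theta0 - theta) ** p
    return ((alpha0 - alpha) ** 2 + (beta0 - beta) ** 2 * theta0 ** p
            + beta * beta0 * gap
            + beta * (beta0 - beta) * (theta0 ** p - theta ** p))
```

All three functions agree up to rounding error and vanish at $\eta = \eta_0$, which is a convenient sanity check when reimplementing the criterion.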

To show that $\hat\eta_n$ is a consistent estimator of $\eta_0$, first note that $\hat\eta_n$ is
uniformly tight. Also notice that $\mathbb{M}(\eta)$ is continuous and has a unique minimum
at $\eta_0$, and, thus, by Theorem 3.2.3(i) of [35], it is enough to show that
$\mathbb{M}_n \stackrel{P}{\to} \mathbb{M}$ uniformly on each compact subset K of $S = \mathbb{R}^2 \times [0,1]$, and that
$\mathbb{M}$ has a well-separated minimum in the sense that $\mathbb{M}(\eta_0) < \inf_{\eta \notin G} \mathbb{M}(\eta)$ for
every open set G that contains $\eta_0$. That $\mathbb{M}$ has a well-separated minimum
can be seen from the form of the expression in (16). For the uniform convergence,
we need to show that the class $\mathcal{F} = \{m_\eta : \eta \in K\}$ is P-Glivenko-Cantelli
(P-GC). Using GC preservation properties (see Corollary 9.27 of [17]),
it is enough to show that $\mathcal{G} = \{B_H(h) = X(\theta_0 + h) - X(\theta_0) : h \in [-1,1]\}$ is
P-GC. Note that almost all trajectories of X are Lipschitz of any order strictly
less than H, in the sense of (22) in Lemma 8.1 below. Thus, the bracketing
number $N_{[\,]}(\epsilon, \mathcal{G}, L_1(P)) < \infty$ and $\mathcal{G}$ is P-GC, by Theorems 2.7.11 and 2.4.1
of [35].



8.2. Rate of convergence. We will apply a result of van der Vaart and
Wellner ([35], Theorem 3.2.5) to obtain a lower bound on the rate of convergence
of the M-estimator $\hat\eta_n$. Setting $d(\eta,\eta_0) = \max\{|\alpha - \alpha_0|, |\beta - \beta_0|, |\theta - \theta_0|^H\}$,
the first step is to show that

$$
\mathbb{M}(\eta) - \mathbb{M}(\eta_0) \gtrsim d^2(\eta,\eta_0)
\tag{17}
$$

in a neighborhood of $\eta_0$. Here $\gtrsim$ means that the right-hand side is bounded
above by a (positive) constant times the left-hand side. Note that $|\theta_0|^{2H} - |\theta|^{2H}$
has a bounded derivative in $\theta \in [\delta, 1]$, where $\delta > 0$ is fixed, so for such
$\theta$ we have

$$
\begin{aligned}
\beta(\beta_0 - \beta)[|\theta_0|^{2H} - |\theta|^{2H}]
&\ge -|\beta||\beta_0 - \beta|\,C\,|\theta_0 - \theta| \\
&= -[|\beta| C |\theta_0 - \theta|^{1-H}]\,|\beta_0 - \beta||\theta_0 - \theta|^H \\
&\ge -c(\theta,\beta)\{(\beta_0 - \beta)^2 + |\theta_0 - \theta|^{2H}\},
\end{aligned}
\tag{18}
$$

where C is the bound on the derivative, $c(\theta,\beta) = |\beta| C |\theta_0 - \theta|^{1-H}/2$, and we
used the inequality $|ab| \le (a^2 + b^2)/2$. As $\beta_0 \ne 0$ and $0 < \theta_0 < 1$, by combining
(16) and (18), suitably grouping terms, and noting that $c(\theta,\beta)$ can be made
arbitrarily small by restricting $\eta$ to a sufficiently small neighborhood of $\eta_0$,
there exist $c_1 > 0$ and $c_2 > 0$ such that

$$
\mathbb{M}(\eta) - \mathbb{M}(\eta_0) \ge (\alpha_0 - \alpha)^2 + c_1(\beta_0 - \beta)^2 + c_2|\theta_0 - \theta|^{2H},
$$

which shows that (17) holds.

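Let $\mathcal{M}_\delta = \{m_\eta - m_{\eta_0} : d(\eta,\eta_0) < \delta\}$, where $\delta \in (0,1]$. Note that

$$
\begin{aligned}
m_\eta - m_{\eta_0} &= (\alpha^2 - \alpha_0^2) + \beta^2[X^2(\theta) - X^2(\theta_0)] + (\beta^2 - \beta_0^2)X^2(\theta_0) \\
&\quad - 2Y(\alpha - \alpha_0) - 2\beta Y[X(\theta) - X(\theta_0)] - 2(\beta - \beta_0)YX(\theta_0) \\
&\quad + 2\alpha\beta[X(\theta) - X(\theta_0)] + 2\alpha X(\theta_0)(\beta - \beta_0) \\
&\quad + 2\beta_0 X(\theta_0)(\alpha - \alpha_0).
\end{aligned}
\tag{19}
$$

This shows that $\mathcal{M}_\delta$ has envelope

$$
\begin{aligned}
M_\delta(Y,X) &= (2|\alpha_0| + \delta)\delta + (|\beta_0| + \delta)^2 \sup_{|\theta - \theta_0|^H < \delta} |X^2(\theta) - X^2(\theta_0)| \\
&\quad + X^2(\theta_0)\,\delta(2|\beta_0| + \delta) + 2|Y|\delta \\
&\quad + 2|Y|(|\beta_0| + \delta) \sup_{|\theta - \theta_0|^H < \delta} |X(\theta) - X(\theta_0)| \\
&\quad + 2|X(\theta_0)||Y|\delta + 2(|\alpha_0| + \delta)(|\beta_0| + \delta) \sup_{|\theta - \theta_0|^H < \delta} |X(\theta) - X(\theta_0)| \\
&\quad + 2(|\alpha_0| + \delta)|X(\theta_0)|\delta + 2|\beta_0||X(\theta_0)|\delta.
\end{aligned}
\tag{20}
$$

Using a maximal inequality for fBm (Theorem 1.1 of [26]), we have

$$
E\Bigl[\sup_{|\theta - \theta_0|^H < \delta} |X(\theta) - X(\theta_0)|^q\Bigr] \le C_q\,\delta^q
\tag{21}
$$

for any $q > 0$. Then, using (A3) in conjunction with Hölder's inequality (cf.
the proof of Lemma 8.1), all nine terms in (20) can be shown to have second
moments bounded by $\delta^2$ (up to a constant) and, thus, $E M_\delta^2 \lesssim \delta^2$.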

The following lemma shows that $\mathcal{M}_\delta$ is "Lipschitz in parameter" and,
consequently, that the bracketing entropy integral $J_{[\,]}(1, \mathcal{M}_\delta, L^2(P))$ is
uniformly bounded as a function of $\delta \in (0,1]$; see [35], page 294. Without loss of
generality, to simplify notation, we assume that $\alpha = 0$ and $\beta = 1$, and state
the lemma with $\theta$ as the only parameter.

Lemma 8.1. If (A1) and (A3) hold and $0 < \alpha < H$, there is a random
variable L with finite second moment such that

$$
|m_{\theta_1} - m_{\theta_2}| \le L|\theta_1 - \theta_2|^\alpha
\tag{22}
$$

for all $\theta_1, \theta_2 \in [0,1]$ almost surely.

Proof. The trajectories of fBm are Lipschitz of any order $\alpha < H$ in the
sense that

$$
|X(t) - X(s)| \le \xi|t - s|^\alpha \quad \forall\, t,s \in [0,1]
\tag{23}
$$

almost surely, where $\xi$ has moments of all orders; this is a consequence of
the proof of Kolmogorov's continuity theorem; see Theorem 2.2 of Revuz
and Yor [28]. Noting that $m_\theta(X,Y) = (Y - X(\theta))^2$, we then have

$$
|m_{\theta_1} - m_{\theta_2}| \le C|X(\theta_1) - X(\theta_2)| \le L|\theta_1 - \theta_2|^\alpha,
$$

where $C = 2(\sup_\theta |X(\theta)| + |Y|)$ and $L = C\xi$. Here L has a finite second
moment:

$$
E L^2 \le \{E C^{2p}\}^{1/p}\{E \xi^{2q}\}^{1/q} < \infty
$$

by Hölder's inequality for $1/p + 1/q = 1$ with $p = 1 + \delta/2$, where $\delta > 0$ comes
from the moment condition (A3). □

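Using a maximal inequality from [35] (see page 291), we then have

$$
E_P\|\mathbb{G}_n\|_{\mathcal{M}_\delta} \lesssim J_{[\,]}(1, \mathcal{M}_\delta, L^2(P))\,(E M_\delta^2)^{1/2} \lesssim \delta
$$

for all $\delta \in (0,1]$, where $\mathbb{G}_n = \sqrt{n}(\mathbb{P}_n - P)$, and it follows that
$d(\hat\eta_n,\eta_0) = O_P(1/\sqrt{n})$ by Theorem 3.2.5 of [35].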

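8.3. Localizing the criterion function. To simplify notation, let
$r_n^{-1}h = (h_1/\sqrt{n}, h_2/\sqrt{n}, n^{-1/(2H)}h_3)$, for $h = (h_1,h_2,h_3) \in \mathbb{R}^3$. Then

$$
\zeta_n = \operatorname*{argmin}_h\,[\mathbb{M}_n(\eta_0 + r_n^{-1}h) - \mathbb{M}_n(\eta_0)]
\tag{24}
$$

and we can write the expression in the square brackets after multiplication
by n as the sum of an empirical process and a drift term:

$$
\mathbb{G}_n[\sqrt{n}(m_{\eta_0 + r_n^{-1}h} - m_{\eta_0})] + n[\mathbb{M}(\eta_0 + r_n^{-1}h) - \mathbb{M}(\eta_0)].
\tag{25}
$$

First consider the empirical process term, and note that

$$
m_{\eta_0 + r_n^{-1}h} = [Y - (\alpha_0 + n^{-1/2}h_1) - (\beta_0 + n^{-1/2}h_2)X(\theta_0 + n^{-1/(2H)}h_3)]^2,
$$

so we obtain

$$
\begin{aligned}
\sqrt{n}(m_{\eta_0 + r_n^{-1}h} - m_{\eta_0})
&= \frac{1}{\sqrt{n}}\Bigl[h_1 + \Bigl(\beta_0 + \frac{h_2}{\sqrt{n}}\Bigr)\mathbb{B}_n(h_3) + h_2X(\theta_0)\Bigr]^2 \\
&\quad - 2\varepsilon\Bigl[h_1 + \Bigl(\beta_0 + \frac{h_2}{\sqrt{n}}\Bigr)\mathbb{B}_n(h_3) + h_2X(\theta_0)\Bigr],
\end{aligned}
\tag{26}
$$

where $\mathbb{B}_n(h_3) = \sqrt{n}[X(\theta_0 + n^{-1/(2H)}h_3) - X(\theta_0)] \stackrel{d}{=} B_H(h_3)$ (as a process in
$h_3$).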

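The result of applying $\mathbb{G}_n$ to the first term on the right-hand side of the
above display gives a term of order $o_P(1)$ uniformly in $h \in [-K,K]^3$, for
each $K > 0$. This is seen by applying the maximal inequality from [35], page
291, as used above; here the class of functions $\mathcal{F}_n$ in question is bounded by
the envelope function

$$
F_n = 3\Bigl\{\frac{K^2}{\sqrt{n}} + \Bigl(|\beta_0| + \frac{K}{\sqrt{n}}\Bigr)^2 \sup_{|h_3| < K} \frac{\mathbb{B}_n(h_3)^2}{\sqrt{n}} + \frac{K^2X(\theta_0)^2}{\sqrt{n}}\Bigr\},
$$

for which $PF_n^2 = o(1)$ and $J_{[\,]}(1, \mathcal{F}_n, L^2(P)) < \infty$; cf. the proof of Lemma
8.1. Hence, we just need to consider the second term. To determine the limit
distribution of the empirical process term in (25), it thus suffices to show
that

$$
\mathbb{G}_n[(\varepsilon, \varepsilon\mathbb{B}_n(h_3), \varepsilon X(\theta_0))] \stackrel{d}{\to} (\sigma Z_1, \sigma B_H(h_3), \sigma|\theta_0|^H Z_2)
\tag{27}
$$

in $\mathbb{R} \times C[-K,K] \times \mathbb{R}$, where $Z_1, Z_2$ are i.i.d. $N(0,1)$ and independent of the
fBm $B_H$. For the second component above, notice that since $\varepsilon$ is independent
of $\mathbb{B}_n$,

$$
\mathbb{G}_n[\varepsilon\mathbb{B}_n(h_3)] \stackrel{d}{=} B_H(h_3)\Bigl(\frac{1}{n}\sum_{i=1}^n \varepsilon_i^2\Bigr)^{1/2} \stackrel{d}{\to} \sigma B_H(h_3)
\tag{28}
$$

in $C[-K,K]$. The asymptotic independence of the three components of (27)
is a consequence of

$$
\begin{aligned}
\mathrm{Cov}(\varepsilon, \varepsilon\mathbb{B}_n(h_3)) &= \sigma^2 E[\mathbb{B}_n(h_3)] = 0, \\
\mathrm{Cov}(\varepsilon, \varepsilon X(\theta_0)) &= \sigma^2 E[X(\theta_0)] = 0, \\
\mathrm{Cov}(\varepsilon\mathbb{B}_n(h_3), \varepsilon X(\theta_0)) &= \frac{\sigma^2}{2}\Bigl[\sqrt{n}(|\theta_0 + n^{-1/(2H)}h_3|^{2H} - |\theta_0|^{2H}) - \frac{|h_3|^{2H}}{\sqrt{n}}\Bigr],
\end{aligned}
$$

which tends to zero uniformly in $h_3 \in [-K,K]$, using the assumption $H < 1$.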
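The vanishing of the third covariance can be illustrated numerically. The Python sketch below evaluates the covariance between the localized increment $\sqrt{n}[X(\theta_0 + n^{-1/(2H)}h_3) - X(\theta_0)]$ and $X(\theta_0)$, computed directly from the fBm covariance formula; the function name `cross_cov` is hypothetical, and the factor $\sigma^2$ is omitted.

```python
import math


def cross_cov(n, h3, theta0, H):
    """Cov of sqrt(n)[X(theta0 + n^(-1/(2H)) h3) - X(theta0)] with X(theta0) for fBm."""
    shift = n ** (-1.0 / (2 * H)) * h3
    return 0.5 * (math.sqrt(n) * (abs(theta0 + shift) ** (2 * H) - theta0 ** (2 * H))
                  - abs(h3) ** (2 * H) / math.sqrt(n))


# The covariance shrinks as n grows when H < 1.
for n in (10 ** 2, 10 ** 4, 10 ** 6, 10 ** 8):
    print(n, cross_cov(n, 1.0, 0.5, 0.7))
```

For H = 1/2 and $h_3 > 0$ the expression is zero for every n, reflecting the independent increments of Brownian motion; for other H the decay is of order $n^{1/2 - 1/(2H)}$, which is slow when H is close to 1.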

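It just remains to find the limit of the drift term in (25). Using (16), it is
given by

$$
\begin{aligned}
&h_1^2 + h_2^2|\theta_0|^{2H} + (\beta_0 + n^{-1/2}h_2)\beta_0|h_3|^{2H} \\
&\quad - h_2(\beta_0 + n^{-1/2}h_2)[\sqrt{n}\{|\theta_0|^{2H} - |\theta_0 + n^{-1/(2H)}h_3|^{2H}\}] \\
&\to h_1^2 + h_2^2|\theta_0|^{2H} + \beta_0^2|h_3|^{2H}
\end{aligned}
$$

uniformly in $h \in [-K,K]^3$. Combining this with the limit distribution of
the first term in (25), we get from (24) and the argmin continuous mapping
theorem that

$$
\begin{aligned}
\zeta_n &\stackrel{d}{\to} \operatorname*{argmin}_h\,[-2\sigma(Z_1h_1 + \beta_0B_H(h_3) + h_2|\theta_0|^HZ_2) + (h_1^2 + h_2^2|\theta_0|^{2H} + \beta_0^2|h_3|^{2H})] \\
&\stackrel{d}{=} \Bigl(\sigma Z_1,\ |\theta_0|^{-H}\sigma Z_2,\ \operatorname*{argmin}_{h_3}\Bigl\{2\frac{\sigma}{|\beta_0|}B_H(h_3) + |h_3|^{2H}\Bigr\}\Bigr).
\end{aligned}
$$

This completes the proof of Theorem 2.1.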
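The third component of the limit has no closed form, but it is easy to simulate. The Python sketch below draws from a grid approximation of $\operatorname{argmin}_h\{2cB_H(h) + |h|^{2H}\}$ in the special case $H = 1/2$, where $B_H$ is two-sided Brownian motion and $|h|^{2H} = |h|$. The function name `argmin_limit_bm`, the truncation to $[-K,K]$ and the step size are all assumptions made for illustration; general H would require an fBm simulator in place of the random walk.

```python
import math
import random


def argmin_limit_bm(c, K, step, rng):
    """One draw of argmin over a grid on [-K, K] of 2*c*B(h) + |h| (case H = 1/2).

    B is two-sided Brownian motion with B(0) = 0, approximated by Gaussian
    random walks run independently to the right and to the left of zero.
    """
    m = int(round(K / step))
    sd = math.sqrt(step)
    best_h, best_val = 0.0, 0.0  # criterion value at h = 0 is 0
    for sign in (1.0, -1.0):
        b = 0.0
        for k in range(1, m + 1):
            b += rng.gauss(0.0, sd)
            h = sign * k * step
            val = 2.0 * c * b + abs(h)
            if val < best_val:
                best_h, best_val = h, val
    return best_h


rng = random.Random(0)
draws = [argmin_limit_bm(1.0, 5.0, 0.01, rng) for _ in range(200)]
```

Here `c` plays the role of $\sigma/|\beta_0|$; larger values spread the draws further from zero, in line with the Wald-type interval widening as the noise level grows.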

8.4. Proof of Theorem 3.1. We prove the result by the method of con- 
tradiction. Before giving the proof, we state a general lemma that can be 
useful in studying bootstrap validity. The lemma can be proved easily us- 
ing characteristic functions; see also Sethuraman [31] and Theorem 2.2 of 
Kosorok [16]. 






Lemma 8.2. Let $W_n$ and $W_n^*$ be random vectors in $\mathbb{R}^l$ and $\mathbb{R}^k$, respectively;
let $Q$ and $Q^*$ be distributions on the Borel sets of $\mathbb{R}^l$ and $\mathbb{R}^k$, and let
$\mathcal{F}_n$ be $\sigma$-fields for which $W_n$ is $\mathcal{F}_n$-measurable. If $W_n$ converges in
distribution to $Q$ and the conditional distribution of $W_n^*$ given $\mathcal{F}_n$ converges (in
distribution) in probability to $Q^*$, then $(W_n, W_n^*)$ converges in distribution
to $Q \times Q^*$.

The basic idea of the proof of the theorem now is to assume that $\Delta_n^* \stackrel{d}{\to} \Delta^*$
in probability, where $\Delta^*$ has the same distribution as $\Delta$. Therefore, $\Delta_n^* \stackrel{d}{\to} \Delta^*$
unconditionally also. We already know that $\Delta_n \stackrel{d}{\to} \Delta$ from Theorem 2.1. By
Lemma 8.2 applied with $W_n = \Delta_n$, $W_n^* = \Delta_n^*$ and $\mathcal{F}_n = \sigma((Y_1,X_1),(Y_2,X_2),\ldots,(Y_n,X_n))$,
we can show that $(\Delta_n, \Delta_n^*)$ converges unconditionally to a
product measure and, thus, $\Delta_n + \Delta_n^* \stackrel{d}{\to} \Delta + \Delta^*$. Thus, $n^{1/(2H)}(\hat\theta_n^* - \theta_0) = \Delta_n + \Delta_n^*$
converges unconditionally to a tight limiting distribution which
has twice the variance of $\Delta$.

Using arguments along the lines of those used in the proof of Theorem
2.1, we can show that

$$
n^{1/(2H)}(\hat\theta_n^* - \theta_0) \stackrel{d}{\to} \operatorname*{argmin}_t\{2\sigma\beta_0(B_H(t) + B_H^*(t)) + \beta_0^2|t|^{2H}\} = \Delta^{**},
$$

where $B_H^*$ is another independent fBm with Hurst exponent H. Using
properties of fBm, we see that

$$
\Delta^{**} \stackrel{d}{=} \Bigl(\frac{\sqrt{2}\,\sigma}{|\beta_0|}\Bigr)^{1/H} \operatorname*{argmin}_t\{B_H(t) + |t|^{2H}/2\} \stackrel{d}{=} 2^{1/(2H)}\Delta.
$$

Thus, the variance of the limiting distribution of $n^{1/(2H)}(\hat\theta_n^* - \theta_0)$ is $2^{1/H} > 2$
times the variance of $\Delta$, which is a contradiction.
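The size of the contradiction depends on H. As a small illustration, the Python sketch below tabulates the variance inflation factor $2^{1/H}$ implied by the scaling relation $\Delta^{**} \stackrel{d}{=} 2^{1/(2H)}\Delta$ and compares it with the factor 2 that consistency of the nonparametric bootstrap would require. The function name is hypothetical.

```python
def bootstrap_variance_inflation(hurst):
    """Var(Delta**) / Var(Delta) = (2 ** (1 / (2H))) ** 2 = 2 ** (1 / H)."""
    return 2.0 ** (1.0 / hurst)


# A consistent bootstrap would need the factor 2; for H < 1 the factor is larger.
for hurst in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(hurst, bootstrap_variance_inflation(hurst))
```

For Brownian motion (H = 1/2) the factor is 4, and it grows rapidly as H decreases, so the mismatch is most severe for rough trajectories.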

8.5. Proof of Theorem 3.2. The bootstrap sample is $\{(Y_i^*, X_i), i = 1,\ldots,n\}$,
where the $Y_i^*$ are defined in (7). Letting $\mathbb{M}_n^*(\eta) = \mathbb{P}_n^* m_\eta = \frac{1}{n}\sum_{i=1}^n [Y_i^* - \alpha - \beta X_i(\theta)]^2$,
the bootstrap estimates are

$$
\hat\eta_n^* = (\hat\alpha_n^*, \hat\beta_n^*, \hat\theta_n^*) = \operatorname*{argmin}_\eta\, \mathbb{M}_n^*(\eta).
\tag{29}
$$

We omit the rate of convergence part of the proof, and concentrate on
establishing the limit distribution. Also, to keep the argument simple, we will
assume that $\hat\eta_n \to \eta_0$ a.s., but a subsequence argument can be used to bypass
this assumption. Note that

$$
\zeta_n^* = \operatorname*{argmin}_{h \in \mathbb{R}^3}\{n(\mathbb{P}_n^* - P_n)[m_{\hat\eta_n + r_n^{-1}h} - m_{\hat\eta_n}] + nP_n[m_{\hat\eta_n + r_n^{-1}h} - m_{\hat\eta_n}]\},
\tag{30}
$$

where $P_n$ is the probability measure generating the bootstrap sample.
Consider the first term within the curly brackets. Using a similar calculation as
in (26),

$$
\sqrt{n}(m_{\hat\eta_n + r_n^{-1}h} - m_{\hat\eta_n}) = -2\varepsilon^*[h_1 + \hat\beta_n\mathbb{B}(\hat\theta_n, h_3) + h_2X(\hat\theta_n)] + A_n,
\tag{31}
$$

where $\mathbb{B}(\theta,t) = \sqrt{n}[X(\theta + n^{-1/(2H)}t) - X(\theta)]$, with the dependence on n
suppressed for notational convenience, and $a_n = \sqrt{n}(\mathbb{P}_n^* - P_n)A_n = o_P(1)$
uniformly in $h \in [-K,K]^3$. Then, using (31),

$$
\begin{aligned}
\sqrt{n}(\mathbb{P}_n^* - P_n)[\sqrt{n}(m_{\hat\eta_n + r_n^{-1}h} - m_{\hat\eta_n})]
&= -2\sqrt{n}(\mathbb{P}_n^* - P_n)[\varepsilon^*\{h_1 + \hat\beta_n\mathbb{B}(\hat\theta_n, h_3) + h_2X(\hat\theta_n)\}] + a_n \\
&\stackrel{d}{\to} -2\sigma(Z_1h_1 + \beta_0B_H(h_3) + h_2|\theta_0|^HZ_2)
\end{aligned}
\tag{32}
$$

in $C[-K,K]$, a.s., where $Z_1, Z_2$ are i.i.d. $N(0,1)$ that are independent of
$B_H$.

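To prove (32), first note that $P_n[\varepsilon^*\{h_1 + \hat\beta_n\mathbb{B}(\hat\theta_n, h_3) + h_2X(\hat\theta_n)\}] = 0$, as
the $X_i$ are fixed and the $\varepsilon^*$ have mean zero under $P_n$. We will need the
following properties of $\mathbb{B}(\hat\theta_n, t)$, proved at the end:

$$
\begin{aligned}
&\frac{1}{n}\sum_{i=1}^n \mathbb{B}_i(\hat\theta_n, t) \stackrel{P}{\to} 0, \qquad
\frac{1}{n}\sum_{i=1}^n \mathbb{B}_i(\hat\theta_n, t)X_i(\hat\theta_n) \stackrel{P}{\to} 0, \\
&\frac{1}{n}\sum_{i=1}^n \mathbb{B}_i(\hat\theta_n, s)\mathbb{B}_i(\hat\theta_n, t) \stackrel{P}{\to} C_H(s,t),
\end{aligned}
\tag{33}
$$

uniformly for $|s|, |t| < K$, where $C_H(s,t)$ is the covariance function (4) of
fBm. Now considering (32), by simple application of the Lindeberg-Feller
theorem, it follows that

$$
\sqrt{n}\,\mathbb{P}_n^*[\varepsilon^*h_1] \stackrel{d}{\to} h_1N(0,\sigma^2), \qquad
\sqrt{n}\,\mathbb{P}_n^*[\varepsilon^*h_2X(\hat\theta_n)] \stackrel{d}{\to} h_2N(0, |\theta_0|^{2H}\sigma^2),
$$

a.s. in $C[-K,K]$. Next consider $\sqrt{n}\,\mathbb{P}_n^*[\varepsilon^*\mathbb{B}(\hat\theta_n, t)]$. The finite-dimensional
convergence and tightness of this process follow from Theorems 1.5.4 and
1.5.7 in [35] using the properties of $\mathbb{B}(\hat\theta_n, t)$ stated in (33). The asymptotic
independence of the terms under consideration also follows using (33) via a
similar calculation as in (29).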

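To study the drift term in (30), note that

$$
\begin{aligned}
P_nm_\eta &= \frac{1}{n}\sum_{i=1}^n P_n[Y_i^* - \alpha - \beta X_i(\theta)]^2 \\
&= \frac{1}{n}\sum_{i=1}^n \frac{1}{n}\sum_{j=1}^n [\hat\alpha_n + \hat\beta_nX_i(\hat\theta_n) + (\hat\varepsilon_j - \bar\varepsilon_n) - \alpha - \beta X_i(\theta)]^2 \\
&= \frac{1}{n}\sum_{i=1}^n [(\hat\alpha_n - \alpha) + (\hat\beta_n - \beta)X_i(\hat\theta_n) + \beta\{X_i(\hat\theta_n) - X_i(\theta)\}]^2
+ \frac{1}{n}\sum_{j=1}^n (\hat\varepsilon_j - \bar\varepsilon_n)^2.
\end{aligned}
\tag{34}
$$

Simple algebra then simplifies the drift term to

$$
\begin{aligned}
nP_n[m_{\hat\eta_n + r_n^{-1}h} - m_{\hat\eta_n}]
&= \frac{1}{n}\sum_{i=1}^n [h_1 + h_2X_i(\hat\theta_n)]^2
+ \Bigl(\hat\beta_n + \frac{h_2}{\sqrt{n}}\Bigr)^2 \frac{1}{n}\sum_{i=1}^n \mathbb{B}_i(\hat\theta_n, h_3)^2 \\
&\quad + 2h_1\Bigl(\hat\beta_n + \frac{h_2}{\sqrt{n}}\Bigr)\frac{1}{n}\sum_{i=1}^n \mathbb{B}_i(\hat\theta_n, h_3)
+ 2h_2\Bigl(\hat\beta_n + \frac{h_2}{\sqrt{n}}\Bigr)\frac{1}{n}\sum_{i=1}^n X_i(\hat\theta_n)\mathbb{B}_i(\hat\theta_n, h_3) \\
&\stackrel{P}{\to} h_1^2 + h_2^2|\theta_0|^{2H} + \beta_0^2|h_3|^{2H}
\end{aligned}
\tag{35}
$$

uniformly on $[-K,K]^3$, where we have used the properties of $\mathbb{B}(\hat\theta_n, h_3)$ in
(33) and

$$
\Bigl|\frac{1}{n}\sum_{i=1}^n X_i(\hat\theta_n)\Bigr| \le \sup_\theta |(\mathbb{P}_n - P)X(\theta)| \stackrel{P}{\to} 0.
$$

Thus, combining (30), (32) and (35), we get $\zeta_n^* \stackrel{d}{\to} \zeta$ in probability.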

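It remains to prove (33). We only prove the last part, the other parts
being similar. For fixed $K > 0$, consider the function class

$$
\mathcal{F}_n = \{\mathbb{B}(\theta,s)\mathbb{B}(\theta,t) : \theta \in [0,1], |s| < K, |t| < K\},
$$

which has a uniformly bounded bracketing entropy integral, and envelope

$$
F_n = \sup_{\theta, |s| < K, |t| < K} |\mathbb{B}(\theta,s)\mathbb{B}(\theta,t)| \le n^{\alpha'/H}K^{2(H - \alpha')}\xi^2
$$

from the Lipschitz property (23) of order $\alpha = H - \alpha'$, where $0 < \alpha' < H/2$
and $\xi$ has finite moments of all orders. Then

$$
\begin{aligned}
P\Bigl\{\sup_{|s|,|t| < K}\Bigl|\frac{1}{n}\sum_{i=1}^n \mathbb{B}_i(\hat\theta_n,s)\mathbb{B}_i(\hat\theta_n,t) - C_H(s,t)\Bigr| > \epsilon\Bigr\}
&\le P\Bigl\{\sup_{f \in \mathcal{F}_n} |(\mathbb{P}_n - P)f| > \epsilon\Bigr\}
\le \frac{1}{\epsilon}E\sup_{f \in \mathcal{F}_n} |(\mathbb{P}_n - P)f| \\
&\lesssim \frac{1}{\epsilon\sqrt{n}}J_{[\,]}(1, \mathcal{F}_n, L^2(P))(PF_n^2)^{1/2} \lesssim n^{\alpha'/H - 1/2} \to 0,
\end{aligned}
$$

where we use a maximal inequality in Theorem 2.14.2 of [35].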

Remark. The failure of the nonparametric bootstrap can be explained
from the behavior of the drift term in (30). In the nonparametric bootstrap,
we need to find the conditional limit of $n\mathbb{P}_n[m_{\hat\eta_n + r_n^{-1}h} - m_{\hat\eta_n}]$ given the data,
but observe that $\sqrt{n}\,\mathbb{P}_n$ applied to the second term of (26) fails to converge
in probability. However, when bootstrapping residuals, the drift term in (30)
becomes $nP_n[m_{\hat\eta_n + r_n^{-1}h} - m_{\hat\eta_n}]$, and $\sqrt{n}P_n$ applied to the second term in (26)
vanishes, so the drift term now converges in probability, as seen in (35).
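The residual bootstrap analysed in this proof can be sketched as follows. The Python code is an illustration under stated assumptions rather than the implementation behind the reported results: trajectories are assumed to be stored as lists of values on a common grid, the bootstrap responses are formed as fitted values plus resampled centred residuals (the form used in (34)), and the function names `fit_point_impact` and `residual_bootstrap_theta` are hypothetical.

```python
import random


def fit_point_impact(paths, y):
    """Least squares over grid index, intercept and slope; returns (index, alpha, beta)."""
    n = len(y)
    ybar = sum(y) / n
    best = None
    for j in range(len(paths[0])):
        x = [p[j] for p in paths]
        xbar = sum(x) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        beta = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
        alpha = ybar - beta * xbar
        rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or rss < best[0]:
            best = (rss, j, alpha, beta)
    return best[1], best[2], best[3]


def residual_bootstrap_theta(paths, y, n_boot, rng):
    """Return the fitted grid index and n_boot residual-bootstrap replicates of it."""
    j_hat, a_hat, b_hat = fit_point_impact(paths, y)
    fitted = [a_hat + b_hat * p[j_hat] for p in paths]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mean_resid = sum(resid) / len(resid)
    centred = [r - mean_resid for r in resid]
    draws = []
    for _ in range(n_boot):
        # Trajectories stay fixed; only the errors are resampled.
        y_star = [fi + rng.choice(centred) for fi in fitted]
        draws.append(fit_point_impact(paths, y_star)[0])
    return j_hat, draws
```

A percentile-type interval for the sensitive time point could then be read off from the empirical distribution of the replicates mapped back to grid times, for example `[grid[j] for j in draws]` for a user-supplied `grid`.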

8.6. Proof of Theorem 4.1. The consistency of $\hat\theta_n$ follows using a
Glivenko-Cantelli argument for the function class $\mathcal{F} = \{m_\theta(X,Y) = [Y - X(\theta)]^2 : \theta \in [0,1]\}$
and the existence of a well-separated minimum for $\mathbb{M}$; cf. the proof
of Theorem 2.1. Note that $\theta_0$ is the unique solution of the normal equation
$\mathbb{M}'(\theta) = 0$ and $\mathbb{M}''(\theta_0) > 0$, so

$$
\mathbb{M}(\theta) - \mathbb{M}(\theta_0) \gtrsim d^2(\theta,\theta_0)
\tag{36}
$$

for all $\theta$ in a neighborhood of $\theta_0$, where d is the usual Euclidean distance.
The envelope function $M_\delta = \sup_{|\theta - \theta_0| < \delta}|m_\theta - m_{\theta_0}|$ for
$\mathcal{M}_\delta = \{m_\theta - m_{\theta_0} : |\theta - \theta_0| < \delta,\ \theta \in [0,1]\}$ has $L^2$-norm of order $\delta^H$, from (21),
so Theorem 3.2.5 of [35] applied with $\phi_n(\delta) = \delta^H$ gives rate $r_n = n^{1/(4-2H)}$
with respect to Euclidean distance.
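The rate can be recovered by solving the rate equation of Theorem 3.2.5 of [35], $r_n^2\phi_n(1/r_n) = \sqrt{n}$, with $\phi_n(\delta) = \delta^H$, which gives $r_n^{2-H} = \sqrt{n}$. The short Python sketch below (with a hypothetical function name) evaluates the resulting rate and makes it easy to compare it with the faster rate $n^{1/(2H)}$ of the correctly specified model.

```python
def misspecified_rate(n, hurst):
    """Solution r of r**2 * (1/r)**H = sqrt(n), i.e. r = n ** (1 / (4 - 2H))."""
    return n ** (1.0 / (4.0 - 2.0 * hurst))


def correctly_specified_rate(n, hurst):
    """Rate n ** (1 / (2H)) from Theorem 2.1, shown for comparison."""
    return n ** (1.0 / (2.0 * hurst))


for hurst in (0.3, 0.5, 0.7):
    print(hurst, misspecified_rate(1000, hurst), correctly_specified_rate(1000, hurst))
```

For Brownian motion trajectories (H = 1/2) the misspecified rate is the cube-root rate $n^{1/3}$, whereas the correctly specified model attains rate n.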

Now write $\hat h_n = r_n(\hat\theta_n - \theta_0) = \operatorname*{argmin}_{h \in \mathbb{R}} \widetilde{\mathbb{M}}_n(h)$, where

$$
\widetilde{\mathbb{M}}_n(h) = r_n^2[\mathbb{M}_n(\theta_0 + h/r_n) - \mathbb{M}_n(\theta_0)], \quad h \in \mathbb{R}.
\tag{37}
$$

This gives

$$
\widetilde{\mathbb{M}}_n(h) = n^{-H/(4-2H)}\mathbb{G}_n[Z_n(h)^2] - 2\mathbb{G}_n[WZ_n(h)] + \tfrac{1}{2}\mathbb{M}''(\theta_0)h^2 + A_n,
\tag{38}
$$

where $A_n = o(1)$ uniformly in $h \in [-K,K]$, for any $K > 0$, and

$$
W = \int_0^1 f(t)X(t)\,dt - X(\theta_0) + \varepsilon, \qquad
Z_n(h) = n^{H/(4-2H)}[X(\theta_0 + h/r_n) - X(\theta_0)].
$$

Note that $Z_n(h) \stackrel{d}{=} B_H(h)$ as processes, so, by Donsker's theorem, the first
term in (38) converges to zero in probability uniformly over $[-K,K]$. For
the second term, we claim that

$$
\mathbb{G}_n[WZ_n(h)] \stackrel{d}{\to} aB_H(h)
\tag{39}
$$

as processes in $C[-K,K]$, where $a^2 = E(W^2)$. Application of the argmin
continuous mapping theorem will then complete the proof.

To prove (39), for simplicity, we just give the detailed argument in the
Brownian motion case, with $B = B_{1/2}$ denoting two-sided Brownian motion.
Consider the decomposition

$$
\mathbb{G}_n[WZ_n(h)] = \mathbb{G}_n[(W - W_\eta)Z_n(h)] + \mathbb{G}_n[W_\eta Z_n(h)],
\tag{40}
$$

where

$$
W_\eta = \int_{\theta_0 - \eta}^{\theta_0 + \eta} f(t)X(t)\,dt + (X(\theta_0 + \eta) - X(\theta_0))(F(1) - F(\theta_0 + \eta)),
\tag{41}
$$

$F(\theta) = \int_0^\theta f(t)\,dt$, and $\eta > 0$ is sufficiently small so that $0 < \theta_0 \pm \eta < 1$. Splitting
the range of integration for the first term in W into three intervals, and using
the integration by parts formula (for semimartingales) over the intervals
$[0, \theta_0 - \eta]$ and $[\theta_0 + \eta, 1]$, we get

$$
\begin{aligned}
W - W_\eta &= \int_0^{\theta_0 - \eta} (F(\theta_0 - \eta) - F(t))\,dX(t) + \int_{\theta_0 + \eta}^1 (F(1) - F(t))\,dX(t) \\
&\quad + \varepsilon + X(\theta_0)(F(1) - F(\theta_0 + \eta) - 1),
\end{aligned}
$$

which implies, by the independent increments property, that $W - W_\eta$ is
independent of $Z_n(h)$ for $|h| < \eta n^{1/3}$. Using the same argument as in proving
(28), it follows that

$$
\mathbb{G}_n[(W - W_\eta)Z_n(h)] \stackrel{d}{\to} a_\eta B(h)
$$

as processes in $C[-K,K]$, for each fixed $\eta > 0$, where

$$
a_\eta^2 = E(W - W_\eta)^2 \to E(W^2) = E\Bigl[\int_0^1 f(t)X(t)\,dt - X(\theta_0)\Bigr]^2 + \sigma^2 = a^2
$$

as $\eta \to 0$. Clearly, $a_\eta B(h) \to aB(h)$ in $C[-K,K]$ as $\eta \to 0$. If we show that
the last term in (40) is asymptotically negligible in the sense that, for every
$M > 0$ and $\delta > 0$,

$$
\lim_{\eta \to 0}\limsup_{n \to \infty} P\Bigl(\sup_{|h| < M} |\mathbb{G}_n[W_\eta Z_n(h)]| > \delta\Bigr) = 0,
\tag{42}
$$

this will complete the proof in view of Theorem 4.2 in [4]. Theorem 2.14.2
in [35] gives

$$
E\Bigl[\sup_{|h| < M} |\mathbb{G}_n[W_\eta Z_n(h)]|\Bigr] \lesssim J_{[\,]}(1, \mathcal{F}, L^2(P))(EF^2)^{1/2},
$$

where $J_{[\,]}(1, \mathcal{F}, L^2(P))$ is the bracketing entropy integral of the class of
functions $\mathcal{F} = \mathcal{F}_{n,\eta} = \{W_\eta Z_n(h) : |h| < M\}$, and $F = F_{n,\eta}$ is an envelope function
for $\mathcal{F}$. We can take $F = |W_\eta|\sup_{|h| < M}|Z_n(h)|$. By the Cauchy-Schwarz
inequality,

$$
E(F^2) \le (EW_\eta^4)^{1/2}\Bigl(E\sup_{|h| < M}|B(h)|^4\Bigr)^{1/2} \lesssim \eta M,
$$

where we have used (21). The bracketing entropy integral can be shown to
be uniformly bounded (over all $\eta > 0$ and n) using the Lipschitz property
(23). The previous two displays and Markov's inequality then lead to

$$
\limsup_{n \to \infty} P\Bigl(\sup_{|h| < M} |\mathbb{G}_n[W_\eta Z_n(h)]| > \delta\Bigr) \lesssim \frac{\sqrt{\eta M}}{\delta},
$$

which implies (42) and establishes (39).

To establish (39) for general fBm, we apply Theorem 2.11.23 of [35] to
the class of measurable functions $\mathcal{F}_n = \{f_{n,h} : |h| < M\}$, where $f_{n,h}(X,\varepsilon) = WZ_n(h)$
and $M > 0$ is fixed. Direct computation using the covariance of fBm
shows that the sequence of covariance functions of $f_{n,h}$ converges pointwise
to the covariance function of $aB_H(h)$, and the various other conditions can
be shown to be satisfied using similar arguments to what we have seen
already.

8.7. Proof of (13). The key step involving the localization of the criterion
function again relies on the self-similarity of fBm $B_H$.

$$
\begin{aligned}
n_1^{1/(2H)}(\hat\theta_n - \theta_0) &= \operatorname*{argmax}_h\,(\mathbb{P}_n^1 - \mathbb{P}_n^2)[X(\theta_0 + n_1^{-1/(2H)}h) - X(\theta_0)] \\
&\stackrel{d}{=} \operatorname*{argmax}_h\{(\mathbb{G}_n^1 - \sqrt{n_1/n_2}\,\mathbb{G}_n^2)[B_H(h)] + n_1(M(\theta_0 + n_1^{-1/(2H)}h) - M(\theta_0))\} \\
&\stackrel{d}{\to} \operatorname*{argmax}_h\{(1 + \sqrt{\rho})B_H(h) - c|h|^{2H}\},
\end{aligned}
$$

where $\mathbb{G}_n^j = \sqrt{n_j}(\mathbb{P}_n^j - P_j)$ is the empirical process for the jth sample.